dataset bias
A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on downstream tasks, PLMs tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting performance. Such subnetworks can be found in three scenarios: 1) fine-tuned PLMs, 2) raw PLMs whose subnetworks are then fine-tuned in isolation, and even 3) PLMs without any parameter fine-tuning. However, these results have only been obtained in the in-distribution (ID) setting.
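Such sparse subnetworks are commonly found with magnitude pruning, as in lottery-ticket-style experiments. Below is a minimal sketch of that idea, assuming a PyTorch model and pruning only its linear layers; the global threshold and the 50% sparsity default are illustrative choices, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights across all linear layers.

    Returns binary masks so the subnetwork can be re-applied after
    further training, as in lottery-ticket-style experiments.
    """
    weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    # Global threshold: the k-th smallest |weight| over all layers pooled.
    all_w = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = all_w.kthvalue(k).values
    masks = []
    with torch.no_grad():
        for w in weights:
            mask = (w.abs() > threshold).float()
            w.mul_(mask)  # prune in place
            masks.append(mask)
    return masks

# Toy usage on a BERT-sized feed-forward block (hypothetical stand-in
# for a full PLM such as one loaded via transformers.AutoModel):
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
masks = magnitude_prune(model, sparsity=0.5)
```

Whether such a subnetwork also survives OOD evaluation, rather than only the ID setting, is exactly the question the paper above raises.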
Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias
Data-driven approaches to solving robotic tasks have gained a lot of traction in recent years. However, most existing policies are trained on large-scale datasets collected in curated lab settings. If we aim to deploy these models in unstructured visual environments like people's homes, they will be unable to cope with the mismatch in data distribution. In light of this, we present the first systematic effort to collect a large dataset for robotic grasping in homes. First, to scale and parallelize data collection, we built a low-cost mobile manipulator that can be assembled for under 3,000 USD.
- North America > United States > Pennsylvania (0.04)
- Europe > Poland (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
Li, Zhifei, Qiu, Feng, Wang, Yiran, Xia, Yujing, Xiao, Kui, Zhang, Miao, Zhang, Yan
Abstract--Visual Question Answering (VQA) presents a unique challenge by requiring models to understand and reason about visual content to answer questions accurately. Existing VQA models often struggle with biases introduced by the training data, leading to over-reliance on superficial patterns and inadequate generalization to diverse questions and images. This paper presents a novel model, IOG-VQA, which integrates Object Interaction Self-Attention and GAN-Based Debiasing to enhance VQA model performance. The self-attention mechanism allows our model to capture complex interactions between objects within an image, providing a more comprehensive understanding of the visual context. Meanwhile, the GAN-based debiasing framework generates unbiased data distributions, helping the model to learn more robust and generalizable features. By leveraging these two components, IOG-VQA effectively combines visual and textual information to address the inherent biases in VQA datasets. Extensive experiments on the VQA-CP v1 and VQA-CP v2 datasets demonstrate that our model outperforms existing methods, particularly in handling biased and imbalanced data distributions, highlighting the importance of addressing both object interactions and dataset biases in advancing VQA tasks. Our code is available at https://github.com/HubuKG/IOG-VQA.

Visual Question Answering (VQA) [1] is an interdisciplinary field that combines the challenges of computer vision and natural language processing to generate accurate answers to questions about images. This task requires a deep understanding of both the visual content and the contextual nuances posed by the questions, making it a complex and demanding research area. Despite significant advancements in recent years, current VQA models often struggle with biases introduced by training data [2], [3], [4], leading to an over-reliance on superficial patterns and correlations rather than genuine visual reasoning and understanding.
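As a rough illustration of the object-interaction component, the sketch below applies single-head self-attention over per-object detector features (e.g. 36 bottom-up-attention regions of dimension 2048, a common VQA setup). The class name, dimensions, and residual form are assumptions for illustration, not the authors' exact IOG-VQA architecture; the GAN-based debiasing branch is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectInteractionSelfAttention(nn.Module):
    """Single-head self-attention over per-object region features.

    Input: (batch, num_objects, dim) features from an object detector
    (e.g. Faster R-CNN region features, as in bottom-up-attention VQA).
    Output: object features re-weighted by pairwise object interactions.
    """
    def __init__(self, dim: int = 2048, attn_dim: int = 512):
        super().__init__()
        self.query = nn.Linear(dim, attn_dim)
        self.key = nn.Linear(dim, attn_dim)
        self.value = nn.Linear(dim, dim)
        self.scale = attn_dim ** -0.5

    def forward(self, objs: torch.Tensor) -> torch.Tensor:
        # Pairwise interaction scores between every pair of objects.
        scores = self.query(objs) @ self.key(objs).transpose(-2, -1) * self.scale
        attn = F.softmax(scores, dim=-1)       # (batch, n_obj, n_obj)
        return objs + attn @ self.value(objs)  # residual connection

feats = torch.randn(8, 36, 2048)  # 8 images, 36 detected objects each
out = ObjectInteractionSelfAttention()(feats)
```

The attended features would then be fused with the question encoding; attending over objects rather than grid cells is what lets the model reason about relations such as "the cup on the table".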
A Primer on Causal and Statistical Dataset Biases for Fair and Robust Image Analysis
Machine learning methods often fail when deployed in the real world. Worse still, they fail in high-stakes situations and across socially sensitive lines. These issues have a chilling effect on the adoption of machine learning methods in settings such as medical diagnosis, where they are arguably best-placed to provide benefits if safely deployed. In this primer, we introduce the causal and statistical structures which induce failure in machine learning methods for image analysis. We highlight two previously overlooked problems, which we call the "no fair lunch" problem and the "subgroup separability" problem. We elucidate why today's fair representation learning methods fail to adequately solve them and propose potential paths forward for the field.
- Research Report (0.64)
- Overview (0.42)
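To make the "subgroup separability" notion from the primer above concrete: if a classifier can recover subgroup membership from the inputs alone, a downstream label predictor can exploit that signal as a shortcut. Below is a toy sketch assuming synthetic data and scikit-learn; the feature shift of 1.5 and all distributions are invented for illustration, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000

# Synthetic "images": 20 features; group membership (e.g. an acquisition
# site or demographic attribute) shifts the first feature's mean.
group = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 20))
X[:, 0] += 1.5 * group  # larger shift => more separable subgroups

# If group membership is predictable from the inputs alone, the
# subgroups are separable and shortcut learning becomes possible.
Xtr, Xte, gtr, gte = train_test_split(X, group, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xtr, gtr)
auc = roc_auc_score(gte, clf.predict_proba(Xte)[:, 1])
print(f"group-from-input AUC: {auc:.2f}")  # well above 0.5 here
```

An AUC near 0.5 would indicate inseparable subgroups; the further it rises above chance, the more room a biased label predictor has to condition on group membership.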